Asymmetrically Weighted CCA And Hierarchical Kernel Sentence Embedding For Multimodal Retrieval
Authors
Abstract
Joint modeling of language and vision has been drawing increasing interest. A multimodal data representation that allows bidirectional retrieval, of images by sentences and of sentences by images, is a key aspect of this modeling. In this paper we show that a cross-view mapping of the search space to the query space achieves state-of-the-art performance in bidirectional retrieval using off-the-shelf features. We show that this search-to-query mapping can be implemented by a simple asymmetric weighting of the canonical weights, in which the canonical weights of the search space are scaled by the canonical correlations. We revisit regularization for Canonical Correlation Analysis (CCA) within the framework of spectral filtering and the Björck-Golub algorithm. We present our regularized CCA algorithms within two spectral filtering regularization families: Tikhonov regularization and truncated SVD (T-SVD) regularization. To the best of our knowledge, T-SVD regularization is new in the context of CCA; it is faster to cross-validate than Tikhonov regularized CCA, at the price of a small loss in accuracy. In light of the spectral filtering interpretation of the regularizer, we devise a fast method to choose the regularization parameter in Tikhonov regularized CCA, using the regularization path of T-SVD CCA and the singular values of the data matrices.
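The ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a textbook Björck-Golub style CCA computed from thin SVDs of the centered data matrices, with regularization expressed as a filter on the singular values. The function names (`tikhonov`, `tsvd`, `regularized_cca`, `embed_query`, `embed_search`), the use of unnormalized scatter matrices, and the scale of `lam` are all assumptions made for illustration.

```python
import numpy as np

def tikhonov(lam):
    # Tikhonov filter: 1/s is replaced by 1/sqrt(s^2 + lam).
    return lambda s: 1.0 / np.sqrt(s**2 + lam)

def tsvd(k):
    # Truncated SVD filter: keep 1/s for the top-k singular values, zero the rest.
    def inv_scale(s):
        out = np.zeros_like(s)
        out[:k] = 1.0 / s[:k]
        return out
    return inv_scale

def regularized_cca(X, Y, filt_x, filt_y):
    """CCA from thin SVDs of the centered views (rows are samples).

    filt_x / filt_y map singular values to a regularized inverse scaling.
    Returns canonical weights Wx, Wy, canonical correlations rho, and the
    per-view means. Scatter matrices are not divided by the sample count.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Ux, sx, Vxt = np.linalg.svd(X - mx, full_matrices=False)
    Uy, sy, Vyt = np.linalg.svd(Y - my, full_matrices=False)
    fx, fy = filt_x(sx), filt_y(sy)
    # Filtered cross-product of the two orthonormal bases.
    M = (sx * fx)[:, None] * (Ux.T @ Uy) * (sy * fy)[None, :]
    P, rho, Qt = np.linalg.svd(M, full_matrices=False)
    Wx = Vxt.T @ (fx[:, None] * P)
    Wy = Vyt.T @ (fy[:, None] * Qt.T)
    return Wx, Wy, rho, mx, my

def embed_query(X, Wx, mx):
    # Query view: plain canonical projection.
    return (X - mx) @ Wx

def embed_search(Y, Wy, rho, my):
    # Search view: canonical projection scaled by the canonical correlations
    # (the asymmetric weighting described in the abstract).
    return ((Y - my) @ Wy) * rho
```

In this sketch only the search-side embedding is scaled by `rho`, so swapping the roles of the two views (for example, sentences as queries versus images as queries) changes which side is weighted. How the two embeddings are then compared (cosine similarity, Euclidean distance) is left open here, since the abstract does not specify it.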
Similar resources
Cross-Lingual Word Representations via Spectral Graph Embeddings
Cross-lingual word embeddings are used for cross-lingual information retrieval or domain adaptation. In this paper, we extend Eigenwords, spectral monolingual word embeddings based on canonical correlation analysis (CCA), to cross-lingual settings with sentence alignment. To incorporate cross-lingual information, CCA is replaced with its generalization based on spectral graph embeddings....
Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework
Recently, joint video-language modeling has been attracting more and more attention. However, most existing approaches focus on exploring the language model on top of a fixed visual model. In this paper, we propose a unified framework that jointly models video and the corresponding text sentences. The framework consists of three parts: a compositional semantics language model, a deep video model ...
Steganography Scheme Based on Reed-Muller Code with Improving Payload and Ability to Retrieval of Destroyed Data for Digital Images
In this paper, a new steganography scheme with high embedding payload and good visual quality is presented. Before the embedding process, the secret information is encoded in blocks using a Reed-Muller error correction code. After the data are encoded and embedded into the low-order bits of the host image, a modulus function is used to increase the visual quality of the stego image. Since the proposed method is able to embed...
Using Weighted Distributions for Modeling Skewed, Multimodal and Truncated Data
When the observations reflect a multimodal, asymmetric or truncated structure, or a combination of these, using the usual unimodal and symmetric distributions leads to misleading results. Therefore, distributions capable of modeling skewness, multimodality and truncation have always been of core interest in the statistical literature. There are different methods to contract ...
Multi-View Canonical Correlation Analysis
Canonical correlation analysis (CCA) is a method for finding linear relations between two multidimensional random variables. This paper presents a generalization of the method to more than two variables. The approach is highly scalable, since it scales linearly with respect to the number of training examples and number of views (standard CCA implementations yield cubic complexity). The method i...